Defines the number of training examples in each mini-batch. During one training iteration, the model uses a single mini-batch to estimate the error gradient before updating its weights. **Batch size** is set per GPU, so each GPU processes a mini-batch of this size.

During model training, the training data is packed into mini-batches of a fixed size.
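
As a rough illustration of the packing step, here is a minimal Python sketch. The helper names `make_mini_batches` and `effective_batch_size` are hypothetical and not part of any tool's API. The sketch assumes that a final, smaller batch is kept unless `drop_last` is set, and that in data-parallel training each GPU receives its own mini-batch per iteration, so the number of examples processed per iteration is the per-GPU batch size times the number of GPUs.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_mini_batches(
    examples: Sequence[T], batch_size: int, drop_last: bool = False
) -> List[List[T]]:
    """Split examples into consecutive mini-batches of batch_size items.

    Hypothetical helper for illustration only. The last batch may be
    smaller than batch_size unless drop_last is True.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    batches = [
        list(examples[i : i + batch_size])
        for i in range(0, len(examples), batch_size)
    ]
    # Optionally discard a trailing batch that is not full.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


def effective_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    """Examples processed per iteration, assuming data-parallel training."""
    return per_gpu_batch_size * num_gpus


if __name__ == "__main__":
    data = list(range(10))
    for batch in make_mini_batches(data, batch_size=4):
        print(batch)
    print(effective_batch_size(per_gpu_batch_size=4, num_gpus=2))
```

With 10 examples and a batch size of 4, this yields two full mini-batches and one partial mini-batch of 2 examples; with 2 GPUs and a per-GPU batch size of 4, 8 examples are processed per iteration under the stated assumption.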